An Improved Policy Iteration Algorithm for Partially Observable MDPs

Neural Information Processing Systems

A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.
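The abstract's key point is that representing a policy as a finite-state controller makes policy evaluation a straightforward linear solve. A minimal sketch of that evaluation step, on a hypothetical two-state POMDP (the model numbers and controller below are illustrative, not taken from the paper):

```python
import numpy as np

# Illustrative tiny POMDP: 2 states, 2 actions, 2 observations.
n_states, n_actions, n_obs = 2, 2, 2
gamma = 0.9

# T[a, s, s'] = P(s' | s, a), O[a, s', o] = P(o | s', a), R[s, a] = reward.
T = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# Finite-state controller: each node fixes an action and a successor
# node for every observation received.
node_action = [0, 1]             # action taken in node n
node_succ = [[0, 1], [1, 0]]     # eta(n, o): next node
n_nodes = len(node_action)

# Policy evaluation: solve the linear system
#   V(n, s) = R(s, a_n)
#           + gamma * sum_{s', o} T(s'|s,a_n) O(o|s',a_n) V(eta(n, o), s')
N = n_nodes * n_states
A = np.eye(N)
b = np.zeros(N)
for n in range(n_nodes):
    a = node_action[n]
    for s in range(n_states):
        i = n * n_states + s
        b[i] = R[s, a]
        for sp in range(n_states):
            for o in range(n_obs):
                j = node_succ[n][o] * n_states + sp
                A[i, j] -= gamma * T[a, s, sp] * O[a, sp, o]

V = np.linalg.solve(A, b).reshape(n_nodes, n_states)
print(V)  # value of each controller node in each hidden state
```

Because the controller has finitely many nodes, evaluation is just one linear system in `n_nodes * n_states` unknowns, which is the simplification the abstract highlights.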


Bootstrapping LPs in Value Iteration for Multi-Objective and Partially Observable MDPs

Roijers, Diederik M. (Vrije Universiteit Brussel,  Vrije Universiteit Amsterdam) | Walraven, Erwin (Delft University of Technology) | Spaan, Matthijs T. J. (Delft University of Technology)

AAAI Conferences

Iteratively solving a set of linear programs (LPs) is a common strategy for solving various decision-making problems in Artificial Intelligence, such as planning in multi-objective or partially observable Markov Decision Processes (MDPs). A prevalent feature is that the solutions to these LPs become increasingly similar as the solving algorithm converges, because the solution computed by the algorithm approaches the fixed point of a Bellman backup operator. In this paper, we propose to speed up the solving process of these LPs by bootstrapping based on similar LPs solved previously. We use these LPs to initialize a subset of relevant LP constraints, before iteratively generating the remaining constraints. The resulting algorithm is the first to consider such information sharing across iterations. We evaluate our approach on planning in Multi-Objective MDPs (MOMDPs) and Partially Observable MDPs (POMDPs), showing that it solves fewer LPs than the state of the art, which leads to a significant speed-up. Moreover, for MOMDPs we show that our method scales better in both the number of states and the number of objectives, which is vital for multi-objective planning.
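The core idea, seeding an LP's constraint set with constraints that mattered in a similar, previously solved LP, can be sketched with a generic constraint-generation loop. Everything below (the toy LP, the `solve_by_generation` helper, the perturbation) is an illustrative assumption, not the paper's exact algorithm:

```python
import numpy as np
from scipy.optimize import linprog

def solve_by_generation(c, A, b, warm_rows=()):
    """Solve min c.x s.t. A x <= b, 0 <= x <= 10 by iteratively adding
    the most-violated constraint; warm_rows bootstraps the initial subset."""
    active = list(warm_rows)
    lp_solves = 0
    while True:
        res = linprog(c,
                      A_ub=A[active] if active else None,
                      b_ub=b[active] if active else None,
                      bounds=[(0, 10)] * len(c), method="highs")
        lp_solves += 1
        viol = A @ res.x - b
        worst = int(np.argmax(viol))
        if viol[worst] <= 1e-9:          # all constraints satisfied
            return res.x, active, lp_solves
        active.append(worst)             # generate one more constraint

c = np.array([-1.0, -1.0])               # maximize x + y
A = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 3.0], [1.0, 0.0]])
b = np.array([8.0, 10.0, 15.0, 6.0])

x1, tight, n1 = solve_by_generation(c, A, b)
# A "similar" LP (perturbed right-hand side), bootstrapped with the
# constraint rows that were generated for the first LP.
x2, _, n2 = solve_by_generation(c, A, b + 0.1, warm_rows=tight)
print(n1, n2)  # the bootstrapped solve needs fewer LP iterations
```

In the paper's setting the successive LPs come from Bellman backups and so become increasingly similar as the algorithm converges, which is exactly when this kind of warm-starting pays off most.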


Learning Policies in Partially Observable MDPs with Abstract Actions Using Value Iteration

Janzadeh, Hamed (The University of Texas at Arlington) | Huber, Manfred (The University of Texas at Arlington)

AAAI Conferences

While the use of abstraction, and its benefits for transferring learned information to new tasks, has been studied extensively and successfully in MDPs, it has not been studied in the context of Partially Observable MDPs. This paper addresses the problem of transferring skills from previous experiences in POMDP models using high-level actions (options). It shows that the optimal value function remains piecewise-linear and convex when policies use high-level actions, and shows how value iteration algorithms can be modified to support options. The results can be applied to all existing value iteration algorithms. Experiments show how adding options can speed up the learning process.
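The piecewise-linear-and-convex (PWLC) property the abstract relies on means the value function can be stored as a finite set of alpha-vectors, with the value of a belief being the maximum dot product over that set. A minimal sketch of the standard exact value-iteration backup that preserves this representation, on a hypothetical tiny POMDP (the model is illustrative; the paper's extension to options is not reproduced here):

```python
import itertools
import numpy as np

# Illustrative tiny POMDP: 2 states, 2 actions, 2 observations.
n_states, n_actions, n_obs = 2, 2, 2
gamma = 0.95
T = np.array([[[0.9, 0.1], [0.1, 0.9]],     # T[a, s, s']
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.2, 0.8]],     # O[a, s', o]
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],                   # R[s, a]
              [0.0, 1.0]])

def backup(Gamma):
    """One exact DP backup: the result is again a finite set of
    alpha-vectors, so the value function stays PWLC."""
    new = []
    for a in range(n_actions):
        # g[o][i](s) = gamma * sum_s' T[a,s,s'] O[a,s',o] alpha_i(s')
        g = [[gamma * (T[a] * O[a][:, o]) @ alpha for alpha in Gamma]
             for o in range(n_obs)]
        # One new vector per choice of successor alpha for each observation.
        for choice in itertools.product(range(len(Gamma)), repeat=n_obs):
            new.append(R[:, a] + sum(g[o][choice[o]] for o in range(n_obs)))
    return np.unique(np.round(new, 10), axis=0)  # crude duplicate pruning

Gamma = [np.zeros(n_states)]
for _ in range(3):
    Gamma = list(backup(Gamma))

V = lambda belief: max(belief @ alpha for alpha in Gamma)  # PWLC value
print(V(np.array([0.5, 0.5])))
```

The paper's result is that replacing the primitive actions `a` in this backup with options leaves the alpha-vector representation intact, so any such value-iteration algorithm can be adapted.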